Exclusion regions and their power. 
The construction of exclusion regions in searches for new phenomena has been
debated for many years, yet no clear mathematical definition of the problem
has been stated so far. In this paper we formulate the problem in mathematical
terms and propose a solution within the framework of statistical tests. The
proposed solution avoids the problems of the currently used procedures.

I. INTRODUCTION

When the existence of a new phenomenon is proposed, an experiment is designed
which exploits the differences between the adopted (old) and the new theories
to check whether there is evidence to reject the old theory in favor of the
new one. It is this difference which provides the signal in the experiment.
If such a signal is found, a discovery is claimed and the values of the
parameters of the new theory are measured. If, on the other hand, no evidence
contradicting the old theory is found, it is desirable to set a constraint on
the possible values of the parameters of the new theory. The logic behind this
is simple: if the values of the parameters of the new theory were inside a
certain region of the parameter space, the experiment would have found
evidence against the old theory in favor of the new one. Since no such
evidence is actually found, that region is excluded and is called the
exclusion region.

Traditionally, exclusion regions on the parameters of a theory are constructed
from the upper boundary of a classical one-sided confidence interval. Often,
the exclusion regions constructed by one experiment rule out signals reported
by others (see, for instance, CDMS and DAMA, or LSND and KARMEN). Therefore,
the task of confidence interval construction receives considerable attention
and is the subject of many controversies. There are, however, serious problems
associated with the use of confidence intervals for exclusion region
construction which are often overlooked. Indeed, one of the prerequisites of
the theory of statistical estimation based on the classical theory of
probabilities is the knowledge that the observed data have arisen from the
phenomenon being observed. In other words, it is supposed that there is no
question whether the observed data $x$ were drawn from the probability
distribution $p_1(x;\mu)$; it is known for a fact, and an attempt is being
made to quantify the value of the parameter $\mu$ by constructing a one-sided
confidence interval.

Hence, it is immediately seen that the classical theory of estimation is not
applicable to situations in which it is not known that the phenomenon exists.
The application of the theory to such problems may lead to intervals which do
not have the desired confidence level. Another problem is that a confidence
interval constructed in such a way may exclude values of the parameter $\mu$
for which the experiment is insensitive.

Thus, the dissatisfaction with the classical theory of estimation is not due
to imperfections in the theory but is caused by misuse of the theory and by a
lack of mathematical clarity in the problems whose solutions are sought within
the framework of the classical estimation theory. Yet, it is desirable to be
able to construct exclusion regions objectively, without the use of subjective
priors.

In this work we formulate the question of what can be stated regarding the
parameter $\mu$ of the hypothesis $H_1(\mu)$ of presence of the new phenomenon
when the hypothesis $H_0$ of absence of the new phenomenon is not rejected
based on the outcome of the experiment. We also propose a solution to this
problem formulated within the framework of the hypothesis test formalism. In
addition, we propose a clear definition of the sensitivity of a detector.



II. STATEMENT OF THE PROBLEM AND ITS 
SOLUTION 

Consider an experiment searching for a new phenomenon in which the decision
on the plausibility of the existence of the new phenomenon is made with the
help of a statistical test. In such a test, the hypothesis tested (the null
hypothesis $H_0$) is the adopted (old) theory, with the alternative hypothesis
$H_1$ that the observed data are due to the new phenomenon. Each of the
hypotheses defines a probability distribution of obtaining every possible
outcome $x$ of the experiment, $p_0(x)$ and $p_1(x)$ respectively. The test is
set up so that if the observed data lie within some critical region $w_c$, the
null hypothesis is rejected; otherwise it is not rejected.

The error leading to an unjust rejection of the null hypothesis is called the
error of the first kind, and its probability is denoted by $\alpha$. It is
common practice to construct the test so as to guarantee that this probability
does not exceed a preset value $\alpha_c$ called the level of significance.
The power of the test $(1-\beta)$ is defined as the probability of rejecting
the null hypothesis when the alternative is true. In other words:

$$\alpha_c \ge \int_{w_c} p_0(x)\,dx, \qquad (1-\beta) = \int_{w_c} p_1(x)\,dx$$

Example 

Suppose that the observable $x$ is distributed according to the Gaussian
distribution with zero mean and known dispersion $\sigma^2$ if the new
phenomenon does not exist. If, however, the new phenomenon exists, the same
observable $x$ is distributed according to a Gaussian with the same dispersion
but positive mean $\mu$. Thus, the hypotheses on the origin of the observed
data $x$ are:

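$$p_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}, \qquad p_1(x;\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}, \quad \mu > 0$$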

The best critical region $w_c$ is defined as $x > x_c$. That is, if the
observed data point $x$ is greater than $x_c$, the null hypothesis of absence
of the new phenomenon is rejected. Thus, in the proposed test the significance
and power are:

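$$\alpha = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{x_c}^{\infty} e^{-x^2/2\sigma^2}\,dx, \qquad (1-\beta) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{x_c}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx$$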

The level of significance is often selected at $\alpha_c = 1.35 \cdot 10^{-3}$,
which corresponds to $x_c = 3\sigma$ in this example.

Suppose further that the value of the parameter $\mu$ of the alternative
hypothesis is large (say $\mu = 5\sigma$) and the alternative hypothesis is
true. Since the existence of the new phenomenon is reported only if
$x > x_c$ is observed, the presence of the new phenomenon will be established
with probability $(1-\beta) = 0.977$. If the alternative hypothesis is true
but the value of the parameter $\mu$ is small (say $\mu \approx 1\sigma$), the
existence of the new phenomenon will be established only in a fraction 0.023
of cases. Thus, it is hopeless to look for the new phenomenon using the
constructed test if the value of $\mu$ is small.
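
The numbers in this example can be checked with a few lines of code. The
following is a minimal sketch, assuming the Gaussian model stated above with
all quantities expressed in units of $\sigma$; the function name
`gaussian_power` is introduced here for illustration only and is not part of
the original text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit dispersion


def gaussian_power(mu_over_sigma, xc_over_sigma=3.0):
    """Probability that x > x_c when x ~ N(mu, sigma^2).

    Both arguments are given in units of sigma (hypothetical helper).
    """
    return 1.0 - STD_NORMAL.cdf(xc_over_sigma - mu_over_sigma)


# Under the null hypothesis (mu = 0) the same integral is the significance.
significance = gaussian_power(0.0)
power_strong = gaussian_power(5.0)  # alternative with mu = 5 sigma
power_weak = gaussian_power(1.0)    # alternative with mu = 1 sigma

print(f"significance     = {significance:.5f}")
print(f"power (5 sigma)  = {power_strong:.3f}")
print(f"power (1 sigma)  = {power_weak:.3f}")
```

With $x_c = 3\sigma$ the significance evaluates to about $1.35 \cdot 10^{-3}$,
and the power is about 0.977 for $\mu = 5\sigma$ and about 0.023 for
$\mu = 1\sigma$.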

The general problem which is being addressed in this paper is the following.
Given a critical region $w_c$ constructed for a test of a null hypothesis
$H_0$ with respect to a composite alternative hypothesis $H_1(\mu)$ with
unknown value of $\mu$, what kind of restriction can be set on the admissible
values of $\mu$ if, based on the outcome of the test, the null hypothesis is
not rejected? Intuitively, one would state that the values of the parameter
$\mu$ of the alternative hypothesis for which the null hypothesis cannot be
rejected reliably should not be excluded based on the non-rejection of the
null hypothesis.

To formulate this intuitive notion in mathematical terms it should be realized
that the composite hypothesis $H_1(\mu)$ can be considered as a set of simple
hypotheses $H_1(\mu)$ corresponding to different fixed values of $\mu$, which
can be classified by the power of the test:

$$(1-\beta(\mu)) = \int_{w_c} p_1(x;\mu)\,dx$$



If the constructed test has low power with respect to the simple alternative
hypothesis $H_1(\mu)$, it is no surprise that no evidence against the null
hypothesis is found. Therefore, if the null hypothesis is not rejected, the
admissible values of the parameter $\mu$ for which the power of the test is
small should not be excluded based on the outcome of the experiment.

If, however, the constructed test has high power with respect to the simple
alternative hypothesis $H_1(\mu)$ and no evidence against the null hypothesis
is found, it may be concluded that the admissible values of the parameter
$\mu$ for which the power of the test is higher than the critical value
$(1-\beta_c)$ can be ruled out as unlikely. The critical value $(1-\beta_c)$
of the power of the test is motivated by the problem at hand and should be
selected at 90% or higher. The value $\mu^c$ of the parameter $\mu$
corresponding to the smallest acceptable power of the test $(1-\beta_c)$ at
significance $\alpha_c$ is the demarcation point (or demarcation hypersurface
if $\mu$ is multi-dimensional) between the allowed and excluded regions of
values of the parameter $\mu$. In the example considered above, the
demarcation point corresponding to the power $(1-\beta_c) = 0.9$ at
significance $\alpha_c = 1.35 \cdot 10^{-3}$ is $\mu^c = 4.3\sigma$. The
values of $\mu$ greater than $\mu^c$ should be considered unlikely when the
null hypothesis is not rejected.
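
For the Gaussian example the demarcation point has a closed form: requiring
the power to equal $(1-\beta_c)$ gives
$\mu^c = x_c + \sigma\,\Phi^{-1}(1-\beta_c)$, where $\Phi^{-1}$ is the inverse
of the standard normal cumulative distribution. The sketch below evaluates it;
the function name `demarcation_point` is hypothetical and the result is in
units of $\sigma$.

```python
from statistics import NormalDist


def demarcation_point(alpha_c, critical_power):
    """Smallest mu (in units of sigma) detected with at least critical_power.

    Assumes the one-sided Gaussian test of the example: reject H0 if x > x_c.
    """
    std = NormalDist()
    xc_over_sigma = std.inv_cdf(1.0 - alpha_c)  # critical value x_c / sigma
    return xc_over_sigma + std.inv_cdf(critical_power)


mu_c = demarcation_point(alpha_c=1.35e-3, critical_power=0.90)
print(f"mu^c = {mu_c:.2f} sigma")
```

For $\alpha_c = 1.35 \cdot 10^{-3}$ and 90% power this gives a value close to
$4.3\sigma$, in agreement with the figure quoted in the text.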

Based on the preceding discussion, it is seen that it is the power of the test
which needs to be maximized when constructing experiments. There are several
ways to achieve this. One way to increase the power is to set a less stringent
level of significance (decrease $x_c$ in the example considered), which comes
at the price of an increased probability of falsely claiming a discovery. The
other, perhaps more desirable, way to increase the power is by fundamental
modification of the experimental setup. Such a modification can be made
keeping the significance level intact but may increase the operating cost.
Examples of this approach are increased observation time or sample size. In
the example considered here, a decrease in the dispersion $\sigma^2$ will
increase the efficacy of the experiment with respect to weak signals while
keeping the significance level intact.
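
The effect of a larger sample size on the power can be illustrated
numerically. The sketch below rests on an assumption that is not stated in the
text: the experiment averages `n_measurements` independent Gaussian
measurements, so that the dispersion of the mean is $\sigma^2/n$. The function
name is hypothetical.

```python
from statistics import NormalDist


def power_after_averaging(mu_over_sigma, n_measurements, alpha_c=1.35e-3):
    """Power of the one-sided test applied to the mean of n measurements.

    Assumption (illustrative): independent measurements, so the dispersion
    of the mean is sigma^2 / n while the significance level is kept fixed.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha_c)            # x_c in units of the new sigma
    shift = mu_over_sigma * n_measurements ** 0.5   # signal in units of the new sigma
    return 1.0 - std.cdf(z_alpha - shift)


# A weak signal (mu = 1 sigma per measurement) at fixed significance.
for n in (1, 4, 16, 25):
    print(n, round(power_after_averaging(1.0, n), 3))
```

Under this assumption the power for a $1\sigma$ signal grows from about 2% for
a single measurement to well above 80% for sixteen, while the probability of a
false discovery stays at $\alpha_c$.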

III. EXPERIMENT SENSITIVITY

Another question which needs to be addressed is that of the sensitivity of an
experiment and what it means. Even though sensitivity is usually interpreted
as the signal strength which a detector is able to detect, this statement
lacks definiteness because detection is a statistical process. Due to
statistical fluctuations a strong signal might be missed and a weak signal
might be detected. Thus, the question of the sensitivity of the experiment has
to be addressed within the framework of statistical tests as well. The
question therefore is: given a critical region $w_c$ constructed for a test of
a null hypothesis $H_0$, what is the efficacy of the test with respect to a
set of simple alternative hypotheses $H_1(\mu)$?

The answer, once again, can be found in terms of significance and power. It is
reasonable to require that any apparatus to be constructed should have a
chance of signal detection of 50% or more at the given level of significance.
That is why it is proposed to quote the sensitivity of a detector as the
signal level that would provide at least 50% power of the test at the
specified level of significance; it should not be regarded as an "absolute"
100% detection level. In the example considered above, the sensitivity of the
experiment is $\mu = 3\sigma$ at significance $1.35 \cdot 10^{-3}$.

At this point it is important to note that two identical experiments looking
at an identical signal at their sensitivity level may produce drastically
different outcomes. One of the experiments may get lucky and claim a discovery
of the phenomenon while the other one may not. It should be stressed that on
this basis it is not possible to conclude that the outcome of one experiment
rules out the signal of the other; there is no contradiction between the two.
In order to confirm or refute a signal detection made by an experiment at its
sensitivity level, it is required to conduct a new test which would have
appreciable (90% or more) power with respect to the claimed signal strength at
a pre-specified significance. Returning to the considered example, to confirm
the discovery made in this experiment a new test would have to be built with
the new dispersion $\sigma^2_{new} = 0.49\,\sigma^2_{old}$ at the same
significance of $1.35 \cdot 10^{-3}$.
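
The required reduction of the dispersion follows from the same Gaussian
relations. A signal claimed at the sensitivity level (50% power) has strength
$x_c = \sigma_{old}\,\Phi^{-1}(1-\alpha_c)$, while 90% power at the same
significance requires that strength to equal
$\sigma_{new}\,[\Phi^{-1}(1-\alpha_c) + \Phi^{-1}(0.9)]$. The sketch below
computes the resulting ratio of dispersions; the function name is hypothetical.

```python
from statistics import NormalDist


def variance_ratio_for_confirmation(alpha_c, claimed_power=0.5, required_power=0.9):
    """Ratio sigma_new^2 / sigma_old^2 needed to confirm a claimed signal.

    The signal was claimed with power claimed_power in the old experiment;
    the new test must reach required_power at the same significance alpha_c.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha_c)
    signal_over_old = z_alpha + std.inv_cdf(claimed_power)   # mu / sigma_old
    signal_over_new = z_alpha + std.inv_cdf(required_power)  # mu / sigma_new
    return (signal_over_old / signal_over_new) ** 2


ratio = variance_ratio_for_confirmation(1.35e-3)
print(f"sigma_new^2 / sigma_old^2 = {ratio:.2f}")
```

For $\alpha_c = 1.35 \cdot 10^{-3}$ the ratio comes out close to 0.49, that is,
the dispersion has to be roughly halved.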

IV. POISSON PROCESS WITH KNOWN 
BACKGROUND 

A case of tremendous practical importance is that in which the number $n$ of
observed events is distributed according to the Poisson distribution

$$p(n;\mu) = \frac{\mu^n}{n!}\, e^{-\mu}$$

The experiment searching for a new phenomenon may be subject to background,
so that

$$H_0:\ \mu = \mu_b, \qquad H_1:\ \mu = \mu_b + \mu_s$$



FIG. 1: A signal with average strength above $\mu_s$ can be detected with
corresponding probabilities of at least 50%, 90% and 99% at significance
$1.35 \cdot 10^{-3}$ in the presence of known average background $\mu_b$.

FIG. 2: The upper end of the 99% confidence level interval for $\mu_s$ from
tables VIII, IX can be anywhere between the dashed and dotted lines. If the
confidence interval were used to construct the exclusion region, the signals
inside the region which are below the solid line could be detected with
probability less than 99% at significance $1.35 \cdot 10^{-3}$.

where $\mu_b \ge 0$ is the average background rate and $\mu_s > 0$ is the
average signal rate.

If the average background rate $\mu_b$ is known, the best critical region
against the stated alternative is constructed by

$$n \ge n_c, \qquad \alpha_c \ge \sum_{k=n_c}^{\infty} p_0(k;\mu_b) = P(n_c,\mu_b)$$

where $P(n,\mu)$ is the complementary regularized incomplete gamma function.
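
The critical count $n_c$ can be found by direct summation of the Poisson
probabilities, without an incomplete gamma routine. The following is a
minimal sketch using only the Python standard library; the helper names
`poisson_tail` and `critical_count` are introduced here for illustration.

```python
import math


def poisson_tail(n, mu):
    """P(N >= n) for N ~ Poisson(mu), computed as 1 - sum_{k<n} p(k; mu)."""
    if n <= 0:
        return 1.0
    term = math.exp(-mu)  # p(0; mu)
    cdf = term
    for k in range(1, n):
        term *= mu / k    # p(k; mu) from p(k-1; mu)
        cdf += term
    return max(0.0, 1.0 - cdf)


def critical_count(mu_b, alpha_c):
    """Smallest n_c such that P(N >= n_c | mu_b) does not exceed alpha_c."""
    n_c = 0
    while poisson_tail(n_c, mu_b) > alpha_c:
        n_c += 1
    return n_c


for mu_b in (0.0, 1.0, 5.0):
    print(mu_b, critical_count(mu_b, 1.35e-3))
```

With zero expected background a single observed event already lies in the
critical region ($n_c = 1$); for $\mu_b = 1$ the same significance requires
$n_c = 6$.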

Thus, if no evidence against the null hypothesis is found, the demarcation
point $\mu_s^c$ on the values of $\mu_s$ corresponding to the power
$(1-\beta_c)$ at significance $\alpha_c$ can be constructed by finding the
value of $\mu_s$ such that:

$$(1-\beta_c) = \sum_{k=n_c}^{\infty} p_1(k;\mu_b+\mu_s)$$



Figure 1 illustrates the situation for the significance of
$1.35 \cdot 10^{-3}$ and different requested powers of the test. It can be
seen that even with zero average background events expected, the signal can be
reliably detected (with 99% probability) only if its average rate is
$\mu_s > 4.61$. The surprisingly high value of the signal is due to the
discrete nature of the Poisson distribution.
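
The demarcation point for the Poisson case can be obtained numerically. The
sketch below first finds the critical count for the given background and then
solves for the signal rate by bisection, using the fact that the power grows
monotonically with the signal rate. It uses only the standard library, and all
function names are hypothetical.

```python
import math


def poisson_tail(n, mu):
    """P(N >= n) for N ~ Poisson(mu)."""
    if n <= 0:
        return 1.0
    term = math.exp(-mu)
    cdf = term
    for k in range(1, n):
        term *= mu / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def critical_count(mu_b, alpha_c):
    """Smallest n_c with P(N >= n_c | mu_b) <= alpha_c."""
    n_c = 0
    while poisson_tail(n_c, mu_b) > alpha_c:
        n_c += 1
    return n_c


def demarcation_signal(mu_b, alpha_c, critical_power, tol=1e-9):
    """Signal rate mu_s at which the power reaches critical_power."""
    n_c = critical_count(mu_b, alpha_c)
    lo, hi = 0.0, 1.0
    while poisson_tail(n_c, mu_b + hi) < critical_power:
        hi *= 2.0                      # expand until the power is reached
    while hi - lo > tol:               # bisection on the monotonic power
        mid = 0.5 * (lo + hi)
        if poisson_tail(n_c, mu_b + mid) < critical_power:
            lo = mid
        else:
            hi = mid
    return hi


mu_s_c = demarcation_signal(mu_b=0.0, alpha_c=1.35e-3, critical_power=0.99)
print(f"mu_s^c = {mu_s_c:.2f}")
```

For zero background the critical region is $n \ge 1$, so the power is
$1 - e^{-\mu_s}$ and the 99% demarcation point is $\ln 100 \approx 4.61$, the
value quoted in the text.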

It might be interesting to visualize what would happen if the 99% confidence
interval with its lower boundary at zero were used for the exclusion region
construction. Because the interval depends on the number of observed events,
its upper boundary may be anywhere between the dashed and dotted lines in
Figure 2. (The dotted line corresponds to the most confining interval,
obtained when zero events are observed, while the dashed line represents the
longest confidence interval obtainable with the left boundary fixed to zero.
The figure is produced from tables VIII and IX of the reference proposing the
interval.) The values above the upper boundary would be excluded with a
confidence of 99%. It is seen that signals inside the exclusion region
constructed from the 99% confidence interval which are below the solid line
would be detected with probability much smaller than 99% at significance
$1.35 \cdot 10^{-3}$.

V. CONCLUSION 

In this report we have considered the problem of what can be stated regarding
the parameter $\mu$ of the alternative hypothesis $H_1(\mu)$ of presence of a
new phenomenon when no evidence against the null hypothesis $H_0$ of absence
of the new phenomenon is found. We have proposed a mathematical formulation of
this problem and its solution within the framework of the theory of hypothesis
tests. We have also given reasons why the classical theory of estimation is
not applicable in situations when the origin of the data is questioned.

Nevertheless, we recommend continuing to report the classical confidence
intervals, which assume that the sought-for new phenomenon exists, for at
least two reasons. First, the confidence intervals constructed now may be
validated by a future experiment which discovers the new phenomenon. Second,
the classical confidence interval provides information to future experiments
about what the value of the parameter might be.

However, we propose to discontinue the use of the classical confidence
intervals for the construction of exclusion regions when no evidence against
the hypothesis of absence of the new phenomenon is found. Instead, we propose
to construct the exclusion regions based on the power of the test, since if
the undiscovered process existed with the parameter inside the exclusion
region it would have been discovered with probability $(1-\beta_c)$ or higher
at significance $\alpha_c$. Other attractive features of the constructed
exclusion region are that less powerful experiments produce less confining
exclusion regions; that the exclusion regions do not shrink if the number of
observed events is less than the average expected background; and that the
procedure avoids problems at physical boundaries on the parameter values and
does not exclude values of the parameter for which the experiment is
insensitive. Also, the procedure of exclusion region construction outlined in
this paper resolves the illusory contradiction between the opposite results of
two independent observations made at the sensitivity level of a detector.

It is proposed to call a detector sensitive to a signal if, at the specified
level of significance, at least 50% power of the test can be achieved for that
signal.